On the Analysis of Shannon-Kotel'nikov Mappings

Pal Anders Floor and Tor A. Ramstad 



Abstract — In this paper an approach to joint source-channel 
coding (JSCC), named Shannon-Kotel'nikov mappings (SK- 
mappings), is presented. SK-mappings are (piecewise) continuous 
direct source-to-channel mappings operating directly on ampli- 
tude continuous discrete time signals. 

A theory for calculating and categorizing the end-to-end dis- 
tortion when using SK-mappings for communication is presented. 
The theory presented is a generalization of Kotel'nikov's theory
on 1:N bandwidth expanding modulation. The proposed theory
is further used to show that SK-mappings have the potential to 
reach the information theoretical bound OPTA (optimal perfor- 
mance theoretically attainable), by letting the dimensionality of 
the mappings go towards infinity. 

Index Terms — Joint source-channel coding, Shannon- 
Kotel'nikov mappings, analog information sources, OPTA, 
geometry, asymptotic analysis. 



I. Introduction 

THIS paper deals with error reduction and compression of 
analog sources of information. Both operations are done 
by what we have chosen to name Shannon-Kotel'nikov map- 
pings (SK-mappings), which is one approach to joint source- 
channel coding (JSCC) realized as (piecewise) continuous 
direct source-to-channel mappings. 

Separation of source and channel coders was proven to
be optimal by Shannon [1] (lossless source coding case) and
Berger (lossy source coding) for communication of a single
source over point-to-point channels, in the separation theorem.
Looking into this theorem and its proof, it actually states that
one can do source and channel coding separately, without
any loss (compared to a joint technique), by using optimal
codes. This, and the benefits it brings concerning interfaces,
is probably one of the major reasons why most communication
systems today use separate source and channel coding (SSCC).
The drawback is that separation is optimal, in general
for point-to-point communication, only by introducing infinite
complexity and delay into the system (codes proven to be
optimal have infinite length). There exists no similar proof that
separate source and channel coders can achieve optimality when
constraints are placed on, e.g., system delay.

Noise free analog signals contain infinite information [3,
pp. 228-229]. Multimedia sources like speech, images, audio
and video all generate analog signals. To be able to communicate
such signals some distortion has to be introduced
(except at infinite capacity, which is not realizable), either
from channel noise or from both channel noise and lossy
compression. To communicate analog source signals reliably
over channels with finite capacity some sort of reduction of
information must occur, meaning that some distortion must
be introduced prior to transmission. This distortion, for a
given rate, cannot be smaller than the distortion given by
the distortion-rate function of the source in question. Even so,
infinite complexity and delay are needed to reach the
distortion-rate bound (e.g. by infinite dimensional vector quantizers).

Taking the above discussion into consideration, what performance
measure should be chosen in order to determine
how well a communication system for transmission of analog
source signals performs? It seems natural to compare it to a
bound giving the best possible fidelity for a given channel
signal-to-noise ratio (CSNR). One such bound is the optimal
performance theoretically attainable (OPTA) [4].

A. OPTA 

OPTA renders the best possible received signal-to-distortion 
ratio (SDR) as a function of the CSNR [4]. The discussion here 
is limited to OPTA for the case of a discrete time memoryless 
Gaussian source and a discrete time AWGN channel. This 
also serves as a lower bound for other memoryless sources 
and channels as well as correlated sources and channels with 
memory. 

The rate-distortion function for a Gaussian source with
bandwidth $W_s$, given the mean-squared error (MSE) distortion
measure, is [3]

$$R(D_t) = W_s \max\left[\ln\left(\frac{\sigma_x^2}{D_t}\right), 0\right], \tag{1}$$

where $\sigma_x^2$ is the signal power, $D_t$ is the distortion and the ratio
$\sigma_x^2/D_t$ is the SDR. The rate is in nats per second (using the
natural logarithm). The channel capacity of a Gaussian channel
with bandwidth $W_c$, and with an average power constraint
(power per channel sample), is [3]

$$C = W_c \ln\left(1 + \frac{P}{\sigma_n^2}\right), \tag{2}$$

where $P$ is the average channel power, $\sigma_n^2$ is the channel noise
power and $P/\sigma_n^2$ is the CSNR. The rate is in nats per second.

To find OPTA, one equates the source rate and channel
capacity, $R = C$. Solving this for the SDR, OPTA is obtained as

$$\frac{\sigma_x^2}{D_t} = \left(1 + \frac{P}{\sigma_n^2}\right)^{W_c/W_s}. \tag{3}$$



The channel/source bandwidth ratio $W_c/W_s$ can in principle
take on any positive real number. If $W_c > W_s$, redundant
bandwidth is available for communication, and could be used
for error reduction. If $W_c < W_s$, the source bandwidth, and
hence the information, has to be reduced by some sort of lossy
compression before transmission.

The bandwidth ratio $W_c/W_s$ in (3) can in practice be
obtained by e.g. combining M source samples into N channel
samples. Assuming Nyquist sampling and an ideal Nyquist
channel, $W_c/W_s \approx N/M = r$. Notice that r is a positive
rational number in this case, i.e. $r \in \mathbb{Q}^+$. Both the source and
channel spaces can be considered Euclidean with dimension M
and N, respectively, and so r is called the "dimension change
factor" (or expansion/reduction factor depending on the case
under consideration). In the following, an operation where a
source of dimension M is mapped onto a channel of dimension
N is referred to as an M:N mapping.
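
As a quick numerical illustration of the OPTA bound (3), the following minimal sketch evaluates the best attainable SDR in decibels for a given CSNR and dimension change factor r = N/M. The function name `opta_sdr_db` and its arguments are illustrative choices made here, not notation taken from the cited references.

```python
import math
from fractions import Fraction


def opta_sdr_db(csnr_db: float, r: float) -> float:
    """Return the OPTA signal-to-distortion ratio in dB.

    Evaluates SDR = (1 + CSNR)^r for a memoryless Gaussian source and an
    AWGN channel, where r = W_c / W_s (approximately N / M).
    """
    if r <= 0:
        raise ValueError("the dimension change factor r must be positive")
    csnr = 10.0 ** (csnr_db / 10.0)           # dB -> linear power ratio
    return 10.0 * r * math.log10(1.0 + csnr)  # 10*log10((1 + CSNR)^r)


if __name__ == "__main__":
    # Compare a 2:1 reduction, a 1:1 mapping and a 1:2 expansion at 20 dB CSNR.
    for m, n in [(2, 1), (1, 1), (1, 2)]:
        r = Fraction(n, m)
        sdr = opta_sdr_db(20.0, float(r))
        print(f"{m}:{n} mapping, r = {r}: OPTA SDR = {sdr:.2f} dB")
```

In decibels the bound is linear in r, which is why doubling the channel dimension doubles the attainable SDR (in dB) at a fixed CSNR.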

B. Joint Source-Channel Coding 

We have stated that noise will always be present when
communicating analog signals. The open question is how the
noise should optimally be distributed between the source
representation and the transmission operation. More specifically,
can we improve system performance, given constraints such as
overall delay and complexity, by introducing greater flexibility
in allowing the distortions to appear wherever they harm the
least? That is, will JSCC give better performance than SSCC in the
cases of finite complexity and delay?

Some specific examples illustrate that JSCC is an interesting
alternative to SSCC in communicating analog sources: It has
been proven in [4], [5] that for an independent and identically
distributed (i.i.d.) source and an additive white Gaussian
noise (AWGN) channel, both of the same bandwidth, OPTA
is achieved by a direct source-channel mapping (i.e. a low
complexity delay free mapping). This has been generalized in
[6] to special combinations of correlated sources and channels
with memory. Furthermore, it has been shown in [7] that
OPTA can be reached also in the bandwidth expansion case
(channel bandwidth larger than the source bandwidth), by the
use of a system with a noiseless feedback link. With non-ideal
feedback, results approaching OPTA can be obtained provided
the feedback channel has a significantly higher CSNR than the
forward channel [8]. Notice that optimal systems for expansion
factors other than $N \in \mathbb{N}$ have not yet been found,
neither have optimal systems for bandwidth reduction,
except for some very special cases [6] (at least to
our knowledge). Of course, if a well performing feedback
channel is not available, the systems in [7], [8] cannot be used,
and other types of techniques (generally nonlinear) must be
considered. Notice that the above mentioned systems, having
good/optimal performance at very low complexity, are purely
analog.

During the last decades, when most communication systems 
have become digital, research has mostly been focused on 
coding and transmission of digital signals, i.e. operations on 
amplitude- and time discrete signals. The reasons for studying 
amplitude continuous systems further are primarily their ro- 
bust character against varying channel conditions, something 
which digital systems normally lack, and their potential low 
complexity and reduced delay (for a given performance). 
Little is known about how to construct analog nonlinear
systems in general. If we consider the case of direct
source-to-channel mappings (mapping between spaces of different
dimensions in general) some general "rules of thumb" should
be considered [9, pp. 102-104]:

1. Mapping distortion (for dimension reducing mappings):
Mapping a source space of high dimension to a channel space
of lower dimension creates distortion unless noise is absent
(Georg Cantor found a way of doing such a mapping in a
one-to-one manner using space-filling curves [10], but this
technique is impossible to use when noise is added). To
minimize the effect of mapping distortion, the mapping must
cover the entire source space, such that every source vector
has a representation point as close as possible.

2. Channel signal power: To minimize the average channel 
power, source vectors with high probability should be mapped 
to channel vectors with low amplitude. 

3. Robustness: To avoid making large errors in the recon- 
structed vectors, vectors that are close in the channel space 
should correspond to vectors close in the source space. The 
opposite, however, is not necessary. 

C. Shannon-Kotel'nikov mappings

As mentioned at the start of this introduction, SK-mapping
is a JSCC approach for merging source and channel coders
into one (piecewise continuous) mathematical operation.
SK-mappings operate directly on amplitude continuous,
discrete time signals, and are applicable to point-to-point
channels where there are only limited possibilities for feedback
(supporting channel state information).

SK-mappings have been shown to perform well with low
complexity and to be robust against varying channel conditions
[11], [12], [13], [14], [15], [16], [17]. The mappings, when
found, will also be easy to adapt to varying channel conditions,
by merely changing a few coefficients in their equations [11],
[12]. Further, they seem to have the potential to perform better
in general than separate systems for a given complexity and
delay fPH . It is for these reasons of interest to investigate and
develop SK-mappings further. A general theory is needed.

The name Shannon-Kotel'nikov mapping was developed
in two steps. First the name Shannon mapping was chosen
in [18] to honor C. E. Shannon, who presented a geometrical
view of the communication problem in his 1949 paper [19].
However, V. A. Kotel'nikov had already developed a theory
for 1:N bandwidth expanding modulation (distributing a scalar
source on N equal channels) in his doctoral dissertation [20],
dating back to 1947, implying the same type of structures
as Shannon. And so the name Shannon-Kotel'nikov mapping
emerged. [21] and [22] sum up the theoretical analysis
done by Kotel'nikov in addition to some further analysis.
SK-mappings have been extended to both M:N dimension
expansion [14] (M < N) and reduction [13] (M > N) using
curves and hyper surfaces to map between source and channel
signals (these will be presented in sections III and V).

Some older and well known techniques that can be seen
as 1:N SK-mappings exist. Pulse Position Modulation (PPM)
and Frequency Position Modulation (FPM) are well known
(seen as SK-mappings for large N). Both have been shown to
be optimal when the expansion factor N goes to infinity [21,
pp. 666-674] in the sense that no other system has a faster
decaying mean squared error. These techniques, however, do
not perform well for small bandwidth expansion factors N.
The reason is that they (ideally) have constant envelope, and
so do not fill the space properly for low N (why constant
envelope systems like PPM can perform close to optimal
for large N will be explained in section IV-C). Kotel'nikov's
results from [20, pp. 62-99] imply that one can use (piecewise)
continuous curves to make well-performing mappings
for all N. Constructing the right type of curve, one can also
make well-performing mappings for small N [11], which is
important for applications where bandwidth and power are
limited. As a special case of this, one can use piecewise
continuous line segments, such that some sub-channels (of the
N-dimensional channel) will have a discrete representation,
whereas the other sub-channels have a continuous representation.
These are named Mixed Base Modulation (MBM) [23].
Other known examples that can be seen as SK-mappings are
the optimal linear systems BPAM [24], hybrid digital analog
(HDA) systems [25], [26], [27], which actually can be seen
as MBM systems (if no channel coders are included in their
discrete part), and the optimal 1:1 direct mapping mentioned
in section I-B.

In this paper a theory for calculating the
distortion/performance using general M:N SK-mappings (for both
dimension expansion and compression) is proposed. The proposed
theory is further used to show that SK-mappings have
the potential to reach OPTA for all dimension/bandwidth ratios
$r \in \mathbb{Q}^+$ in the limit of infinite complexity and delay (infinite
dimensionality). The proposed theory also gives general
guidelines on how to construct these mappings (although it says
nothing directly about the global manifold structure).

The paper is organized as follows. In section II Kotel'nikov's
theory on bandwidth expanding 1:N modulation is introduced.
In section III Kotel'nikov's theory is extended to the M:N
dimension expanding case. In section IV we show that dimension
expanding SK-mappings can reach OPTA, and elaborate
on the effect of dimensionality increase in such systems. In
section V a theory on M:N dimension reducing SK-mappings
is introduced. In section VI we show that also dimension
reducing SK-mappings can reach OPTA. Finally, in section VII
a discussion is given and conclusions are drawn.

II. Kotel'nikov's Theory on 1:N Bandwidth Expanding Mappings

Fig. 1 shows a block diagram for the dimension expanding
communication system under consideration, which can be
used for illustration in this section and section III.

Fig. 1. Block diagram of a general dimension expanding SK-system.

This section describes the necessary aspects (for the sake
of this paper) from part 3 of Kotel'nikov's dissertation [20,
pp. 62-99], in which a theory for transmission of amplitude
continuous, discrete time sources is developed (the theory
of this section is presented in a similar way as in [22,
pp. 287-299]).

Consider an amplitude continuous, discrete time, scalar
source $x \in \mathcal{D} \subseteq \mathbb{R}$. The source is communicated using a
(parametric) curve in the channel space, $x \mapsto \mathbf{s}(x) \in \mathcal{S} \subseteq \mathbb{R}^N$,
called the signal curve. Let the noise be denoted $\mathbf{n} \in \mathbb{R}^N$ (i.i.d.
and Gaussian). Then the received signal is $\hat{\mathbf{s}} = \mathbf{s}(x) + \mathbf{n}$,
with the corresponding likelihood function

$$f_{\hat{\mathbf{s}}|x}(\hat{\mathbf{s}}|x) = \left(\frac{1}{2\pi\sigma_n^2}\right)^{N/2} e^{-\frac{\|\hat{\mathbf{s}} - \mathbf{s}(x)\|^2}{2\sigma_n^2}}. \tag{4}$$

The maximum likelihood (ML) estimate is defined by [28]

$$\hat{x}_{ML} = \arg\max_{x \in \mathcal{D}} f_{\hat{\mathbf{s}}|x}(\hat{\mathbf{s}}|x). \tag{5}$$

As the CSNR gets large, the ML estimate approaches
the optimum estimate (in the mean square sense) [21, pp. 216-219].
(4) is maximized by the value $x$ that minimizes the norm
$\|\hat{\mathbf{s}} - \mathbf{s}(x)\|$, implying that the ML estimate of $x$ corresponds
to the point on the signal curve that is closest to the received
vector in Euclidean distance. From this Kotel'nikov reasoned
that there are two different contributions to the distortion of
the source signal from using such mappings: weak noise, which
is referred to in the following as weak noise distortion, and
anomalous errors, which will be referred to as anomalous
distortion.

When considering weak noise, the signal curve can be
approximated in the vicinity of any transmitted signal value
$x_0$ by

$$\mathbf{s}(x) \approx \mathbf{s}(x_0) + \mathbf{s}'(x_0)(x - x_0), \tag{6}$$

assuming that $\mathbf{s}(x) \in C^1$ (continuously differentiable with
respect to $x$), where $\mathbf{s}'(x_0) = d\mathbf{s}(x)/dx|_{x = x_0}$. The ML estimate of $x$ can
be approximated as the projection onto the tangent line through
the value $\mathbf{s}(x_0)$ on the signal curve. Fig. 2 illustrates this.

Fig. 2. ML estimate in the weak noise case.

Given that the value $x_0$ was transmitted, Kotel'nikov showed
that the minimum mean square error (MSE) at $x_0$ in the weak
noise case is given by

$$\varepsilon_{wn}^2(x_0) = E\{(\hat{x}_{ML} - x_0)^2 \,|\, x = x_0\} = \frac{\sigma_n^2}{\|\mathbf{s}'(x_0)\|^2}. \tag{7}$$

$\sigma_n^2$ is the noise variance and $\|\mathbf{s}'(x_0)\|$ is the Euclidean norm
of the curve's velocity (tangent) vector at the parameter value
$x_0$. The weak noise distortion is given by

$$\bar{\varepsilon}_{wn}^2 = E_x\{\varepsilon_{wn}^2(x)\} = \sigma_n^2 \int_{\mathcal{D}} \frac{1}{\|\mathbf{s}'(x)\|^2} f_x(x)\,dx. \tag{8}$$

$\mathcal{D}$ is the source domain and $f_x(x)$ is the probability density
function of $x$. The geometric interpretation of this result is that
in order to reduce the noise corruption for a given expansion,
without increasing the signal duration or the transmit energy,
the signal curve should be made longer by stretching it like a
rubber band (making the velocity vectors longer). This should
be done without leaving a certain hyper-sphere in order to
satisfy a power constraint on the channel. To make the curve as
long as possible, it has to be folded/twisted inside this
hyper-sphere. That is, nonlinear systems are needed in order to close
in on OPTA, with the exception of very low CSNR (where
linear systems are adequate) or if an ideal feedback channel
is available [7]. This concept is illustrated in Fig. 3 and Fig. 4(a).
However, the length of the curve cannot be increased beyond a
certain value without introducing anomalous errors (also called
the threshold effect [19] by some authors). These errors are
large, since they are the result of the channel noise taking
us from one part of the curve to another. The occurrence of
these anomalous errors depends on the relation between the
standard deviation of the channel noise and the density of the
curve. Fig. 2 illustrates this. Due to the severity of the anomalous
errors it is desirable to make them occur with as small a
probability as possible. Notice that for linear systems there
is no anomalous distortion, i.e. all noise can be considered as
non-anomalous (what we have called weak noise distortion for
the nonlinear mappings).

Fig. 3. Kotel'nikov's concept of analog source error reduction seen intrinsically.

Fig. 4. 1:2 dimension expanding systems. (a) The straight line illustrates a
linear mapping, while the curved line represents a nonlinear mapping. (b) A
nonlinear mapping that has been stretched a significant amount. If stretched
too far, different parts of the signal curve come too close; the noise $\mathbf{n}$ might
then take the transmitted point $\mathbf{s}(x_0)$ closer to the neighboring curve, such
that severe distortion results when decoding.

There is in addition something to gain in lowering the weak
noise distortion further without increasing the probability for
the threshold effect. The gain is in the choice of stretching
function. The stretching function is a bijective function acting
on the parameter space, before mapping it onto the parametric
curve. It will be denoted $\varphi(x)$ in the following. By,
for instance, mapping directly from the parameter $x$ onto the
curve, one will in some cases (e.g. using a spiral like structure
like in Fig. 4) get vectors of increasing length for increasing
values of $x$. Then the received distortion will depend on the
level of the source amplitudes, and the tangent vectors will
be shortest where most of the probability mass of the source
resides (which is undesirable). $\varphi$ should be chosen carefully,
given a specific curve and source pdf. How to find the optimum
$\varphi$ for a given pdf is shown in [22, pp. 294-297].


III. M:N Dimension Expanding SK-mappings. 

In this section the theory from section II is generalized to
include vector sources $\mathbf{x} \in \mathcal{D} \subseteq \mathbb{R}^M$. This makes it possible to
analyze more general mappings and to exploit dimensionality
(letting M, N increase while keeping r constant) for a given
expansion factor r. Assume that the M source components are
i.i.d. and Gaussian. The source will be represented through a
parametric hyper surface in the channel space,
$\mathbf{x} \mapsto \mathbf{S}(\mathbf{x}) \in \mathcal{S} \subseteq \mathbb{R}^N$, which in the following is referred to as the signal
hyper surface or just $\mathcal{S}$. A general M dimensional hyper
surface embedded in $\mathbb{R}^N$ has the following parametric form

$$\mathbf{S}(\mathbf{x}) = [S_1(\mathbf{x}), S_2(\mathbf{x}), \ldots, S_N(\mathbf{x})], \tag{9}$$

where $S_i$ are component functions.

Using $\mathcal{S}$ for communication, the likelihood function of the
received signal $\hat{\mathbf{S}} = \mathbf{S}(\mathbf{x}) + \mathbf{n}$ is

$$f_{\hat{\mathbf{S}}|\mathbf{x}}(\hat{\mathbf{S}}|\mathbf{x}) = \left(\frac{1}{2\pi\sigma_n^2}\right)^{N/2} e^{-\frac{\|\hat{\mathbf{S}} - \mathbf{S}(\mathbf{x})\|^2}{2\sigma_n^2}}, \tag{10}$$

i.e. the ML estimate of $\mathbf{x}$ corresponds to the point on $\mathcal{S}$ that
is closest to the received vector in Euclidean distance.

Fig. 5. ML estimate in the weak noise case considering vector sources.



A. Weak noise distortion. 

In the weak noise case, one can consider the hyper surface's
tangent space. Fig. 5 illustrates this for the 2:3 case. Assume that
each component function of $\mathbf{S}$ satisfies $S_i \in C^1(\mathbb{R}^M)$, $i = 1, \ldots, N$.
The tangent space at a point $\mathbf{x}_0$ is given by (first order Taylor
polynomial)

$$\mathbf{S}(\mathbf{x}) \approx \mathbf{S}(\mathbf{x}_0) + J(\mathbf{x}_0)(\mathbf{x} - \mathbf{x}_0), \tag{11}$$

where $J$ is the Jacobian matrix [29, p. 47] (Appendix A) of $\mathbf{S}$
at $\mathbf{x}_0$. When using an ML detector, the detected vector will
be

$$\mathbf{S}(\hat{\mathbf{x}}_{ML}) = \mathbf{S}(\mathbf{x}_0) + P_{proj}\mathbf{n}, \tag{12}$$

where $P_{proj}$ is a projection matrix given by [30, p. 158]

$$P_{proj} = J(J^T J)^{-1} J^T = J G^{-1} J^T. \tag{13}$$

$G$ is the metric tensor of $\mathcal{S}$ [31, pp. 301-347] (see Appendix A
for a short description). Using (11), (12) and (13) one can
easily show that

$$J(\hat{\mathbf{x}}_{ML} - \mathbf{x}_0) = J G^{-1} J^T \mathbf{n}. \tag{14}$$

Multiplying both sides from the left with $J^T$, $G = J^T J$ shows
up on both sides. Since $G$ is positive definite, and so also
invertible [30], then

$$\hat{\mathbf{x}}_{ML} - \mathbf{x}_0 = G^{-1} J^T \mathbf{n}. \tag{15}$$

The MSE, given that $\mathbf{x}_0$ was transmitted, is

$$\varepsilon_{wn}^2 = \frac{1}{M} E\{(\hat{\mathbf{x}}_{ML} - \mathbf{x}_0)^T(\hat{\mathbf{x}}_{ML} - \mathbf{x}_0)\}. \tag{16}$$

Inserting (15) in (16) one can derive that the weak noise
distortion given that $\mathbf{x}_0$ was transmitted is (see Appendix B
for details)

$$\varepsilon_{wn}^2 = \frac{\sigma_n^2}{M} \sum_{i=1}^{M} \frac{1}{g_{ii}}, \tag{17}$$

which is a natural generalization of (7) ($g_{ii}$ is the squared norm
of the "velocity" vector of $\mathcal{S}$ with respect to $x_i$ at a point $\mathbf{x}_0$).
Considering a Gaussian vector source with i.i.d. components
and i.i.d. Gaussian noise on each sub-channel, the above sum
is minimized when $g_{ii} = g_{jj}, \forall i,j$ (a spherical shape should
be preserved in this case; if $g_{ii} \neq g_{jj}$ a spherical region will be
mapped into an elliptical region going from channel to source
at the receiver). The weak noise distortion will be given by

$$\bar{\varepsilon}_{wn}^2 = E_{\mathbf{x}}\{\varepsilon_{wn}^2(\mathbf{x})\} = \frac{\sigma_n^2}{M} \int\!\!\int\!\cdots\!\int_{\mathcal{D}} \sum_{i=1}^{M} \frac{1}{g_{ii}(\mathbf{x})} f_{\mathbf{x}}(\mathbf{x})\,d\mathbf{x}. \tag{18}$$

This result is a natural generalization of (8), and it states that
stretching the source space out like a "sheet of rubber" before
transmission makes the weak noise distortion go down. Notice
that although the derivation of (18) is done for $C^1$ functions,
one can also use it for piecewise $C^1$ functions by integrating
over each surface element, then summing all contributions at
the end.
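
The local quantities behind (13)-(17) can be checked numerically. The sketch below is a minimal illustration, assuming a hypothetical 2:3 surface (`example_surface`) chosen here only for the demonstration: it forms the Jacobian by central differences, builds the metric tensor G = J^T J, and evaluates the weak noise MSE at a point. It uses the fact that inserting (15) in (16) gives (sigma_n^2 / M) trace(G^{-1}), which coincides with the sum in (17) when G is diagonal (orthogonal velocity vectors). The 2 x 2 inverse is hard-coded, so this version is limited to M = 2.

```python
def example_surface(x, alpha=4.0):
    """Hypothetical 2:3 signal surface used only for illustration."""
    x1, x2 = x
    return [alpha * x1, alpha * x2, 0.5 * (x1 * x1 - x2 * x2)]


def numerical_jacobian(surface, x, h=1e-6):
    """Central-difference Jacobian of surface: R^M -> R^N (N rows, M columns)."""
    n_out = len(surface(x))
    jac = [[0.0] * len(x) for _ in range(n_out)]
    for j in range(len(x)):
        xp, xm = list(x), list(x)
        xp[j] += h
        xm[j] -= h
        sp, sm = surface(xp), surface(xm)
        for i in range(n_out):
            jac[i][j] = (sp[i] - sm[i]) / (2.0 * h)
    return jac


def metric_tensor(jac):
    """Metric tensor G = J^T J (an M x M matrix)."""
    m = len(jac[0])
    return [[sum(row[a] * row[b] for row in jac) for b in range(m)]
            for a in range(m)]


def weak_noise_mse_2d(g, noise_var):
    """(noise_var / M) * trace(G^{-1}) for M = 2, using the closed-form inverse."""
    det = g[0][0] * g[1][1] - g[0][1] * g[1][0]
    trace_inv = (g[0][0] + g[1][1]) / det
    return noise_var * trace_inv / 2.0


def weak_noise_mse_diagonal(g, noise_var):
    """The sum in (17): (noise_var / M) * sum_i 1 / g_ii."""
    m = len(g)
    return noise_var * sum(1.0 / g[i][i] for i in range(m)) / m


if __name__ == "__main__":
    for point in ([0.0, 0.0], [1.0, 0.5]):
        G = metric_tensor(numerical_jacobian(example_surface, point))
        print(point, weak_noise_mse_2d(G, 0.01), weak_noise_mse_diagonal(G, 0.01))
```

At the origin the two velocity vectors of the example surface are orthogonal with equal norm, so both expressions agree and equal the noise variance divided by alpha squared; away from the origin G picks up off-diagonal terms and the trace form is the one that follows from (15).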

Further one can consider a shape preserving mapping,
which is to say that all $g_{ii}$ of $\mathcal{S}$ are equal and independent of
$\mathbf{x}$ (in words: the distances between any vectors of the source are
equally scaled, not distorted, when mapped through $\mathbf{S}$). Then
the action of $\mathbf{S}$ can be seen merely as an amplification factor $\alpha$
(from source to channel at the transmitter). In this case (18)
is reduced to the simple expression

$$\bar{\varepsilon}_{wn}^2 = \frac{\sigma_n^2}{M} \sum_{i=1}^{M} \frac{1}{\alpha^2} = \frac{\sigma_n^2}{\alpha^2}. \tag{19}$$

Notice that (18) says nothing about any possible gain from
increased dimensionality (this is natural since locally linear
systems are considered; in linear systems there is nothing
to gain from dimensionality increase). Increasing the
dimensionality from e.g. a 1:2 to a 2:4 mapping, and stretching
an equal and maximum amount in both parameter directions,
gives the same weak noise MSE as in the 1:2 case.

B. Anomalous distortion and "sphere hardening".

Consider normalized noise vectors in N dimensions,
$\tilde{\mathbf{n}} = \mathbf{n}/\sqrt{N}$. These vectors have a mean square length equal to
$\sigma_n^2$. It is shown in [21, pp. 324-325] that the variance of the
squared length goes to zero as $N \to \infty$. So as dimensionality
increases, the length of the noise vectors will be more
and more localized around the noise standard deviation, and
$\lim_{N \to \infty} \|\tilde{\mathbf{n}}\| = \sigma_n$, which will be referred to as the sphere
hardening limit. This is a consequence of the law of large
numbers. The distribution of the length $\rho = \|\tilde{\mathbf{n}}\|$ is given by
[32, p. 237]

$$f_\rho(\rho) = \frac{2\left(\frac{N}{2}\right)^{N/2} \rho^{N-1}}{\Gamma\left(\frac{N}{2}\right) \sigma_n^N}\, e^{-\frac{N}{2}\left(\frac{\rho}{\sigma_n}\right)^2}, \quad N \geq 2, \tag{20}$$

where $\Gamma(\cdot)$ is the Gamma function [33]. Fig. 6 shows (20) for
some values of N.

Considering this effect on SK-mappings, one can benefit 
from increasing dimensionality in reducing the probability 
for the anomalous errors, still keeping the same distance 
between parts/folds of S. In an infinite dimensional system 
the anomalous distortion can be avoided by letting the distance 
between folds of S be at least two times the standard deviation 
of the noise. 
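
The sphere hardening effect can be observed directly from the pdf (20). The sketch below is a minimal illustration: it evaluates (20) in the log domain and estimates the mean and variance of the normalized noise length by midpoint integration. The function names, the integration range of five standard deviations and the value 0.1 for the noise standard deviation (the value quoted in the caption of Fig. 6) are choices made here for the example.

```python
import math


def norm_pdf(rho, n, sigma):
    """pdf (20) of rho = ||n|| / sqrt(N), evaluated in the log domain."""
    if rho <= 0.0:
        return 0.0
    log_f = (math.log(2.0)
             + 0.5 * n * math.log(0.5 * n)
             + (n - 1) * math.log(rho)
             - math.lgamma(0.5 * n)
             - n * math.log(sigma)
             - 0.5 * n * (rho / sigma) ** 2)
    return math.exp(log_f)


def moments(n, sigma, steps=20000):
    """Midpoint-rule estimates of (total probability, mean, variance) of rho
    on the interval [0, 5 * sigma]."""
    dx = 5.0 * sigma / steps
    m0 = m1 = m2 = 0.0
    for k in range(steps):
        rho = (k + 0.5) * dx
        w = norm_pdf(rho, n, sigma) * dx
        m0 += w
        m1 += rho * w
        m2 += rho * rho * w
    return m0, m1, m2 - m1 * m1


if __name__ == "__main__":
    sigma = 0.1
    for n in (2, 10, 100, 1000):
        total, mean, var = moments(n, sigma)
        print(f"N = {n:4d}: mass = {total:.6f}, mean = {mean:.5f}, std = {math.sqrt(var):.5f}")
```

The mean square length stays at the noise variance for every N, while the spread of the length shrinks as N grows, which is what allows the folds of S to be placed closer together for the same anomalous error probability.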

Now assume a map $\mathbf{S}$ where one can stretch the same
amount in each parameter direction as was the case for one
direction in (8). Then according to (18), nothing is lost by
increasing dimensionality (as long as there are orthogonal
base vectors in the tangent space of $\mathcal{S}$). But the structure
can be "packed" more densely in the channel space due to
the reduced probability for anomalous errors seen by (20).
This yields additional stretching of the source space, and in
principle one should get closer to OPTA by increasing the
system's dimensionality. Fig. 7 illustrates this concept. Notice
that the intersected 2:4 mapping in the figure is just an example
to show the concept, not an actual 2:4 mapping. One has to
consider the whole 4-dimensional space to get the true picture.

Fig. 6. The pdf for the norm of the normalized noise vectors for some values
of N. The noise standard deviation is $\sigma_n = 0.1$.

Fig. 7. Illustration of how it might be possible to close in on OPTA by
increasing dimensionality. The green line illustrates an intersected surface.
Since the distance between two parts of the 2:4 system is smaller for the same
anomalous error probability, the mapping can be made a bit "longer".

IV. THE EFFECT OF DIMENSIONALITY INCREASE ON DIMENSION EXPANDING SK-MAPPINGS.

In this section one approach is taken to show that OPTA can be
reached by using dimension expanding SK-mappings (results
presented previously in [34]).

Assume that both the channel signal and the noise are
normalized with the channel dimension N. Considering a
power constrained Gaussian channel, the normalized received
vector will lie within an $(N-1)$-sphere of radius

$$\rho_N = \sqrt{P_N + \sigma_n^2} + \delta, \tag{21}$$

where $P_N$ is the channel signal power per dimension, and $\sigma_n^2$
the noise variance per dimension. By adding the term $\delta$ one
takes into consideration that $\rho_N$ exceeds $\sqrt{P_N + \sigma_n^2}$ for finite
N, so $\delta \to 0$ as $N \to \infty$ (notice that the definition of an
N-sphere is $\mathbb{S}^N = \{\mathbf{y} \in \mathbb{R}^{N+1} \mid d(\mathbf{y}, \mathbf{0}) = \text{constant}\}$ [31, p. 7],
where $d$ is the distance from any point $\mathbf{y}$ on $\mathbb{S}^N$ to the origin of
$\mathbb{R}^{N+1}$. So the well known sphere embedded in $\mathbb{R}^3$ is denoted
$\mathbb{S}^2$, i.e. the "2-sphere").

As a starting point the theory presented in [21, pp. 666-674]
on 1:N mappings is included, serving as a reference for
further generalization.



A. Asymptotic analysis on 1:N mappings 

To be able to establish how long a signal curve can be
made (and thereby how small the weak noise distortion can
be made) in the limiting case $N \to \infty$, one needs to find
how large a volume of the constrained channel space the
signal curve will occupy when anomalous errors should be
almost absent (avoided in the limit). To this end, the signal
curve is considered to be the axis of a "hyper cylinder" of
dimension $N - 1$ and radius $\rho_s$ (which is a function of the
noise standard deviation). Arguments in [21, pp. 670-672]
suggest that this must be the optimal structure. This cylinder
is placed into the given channel hyper sphere. The noise
vectors can at every point be decomposed into two statistically
independent components: $N - 1$ components normal to the
curve, $\tilde{\mathbf{n}}_a$ (contributing to the anomalous distortion), and one
component tangential to the curve, $\tilde{n}_{wn}$ (contributing to the
weak noise distortion). For large N one can give an upper
bound for the length of the curve. N is assumed to be so large
that anomalous errors are almost avoided by letting $\rho_s > \|\tilde{\mathbf{n}}_a\|$
(and $\delta \approx 0$). Let $B_N$ denote the volume contained within an
$(N-1)$-sphere of unit radius [21, pp. 355-357]

$$B_N = \begin{cases} \dfrac{\pi^{N/2}}{(N/2)!}, & N \text{ even}, \\[2ex] \dfrac{2^N \pi^{(N-1)/2} \left(\frac{N-1}{2}\right)!}{N!}, & N \text{ odd}. \end{cases} \tag{22}$$



The following inequality must be satisfied in order for the
anomalous errors to be absent, and to fulfill the channel power
constraint

$$L B_{N-1} \rho_s^{N-1} < B_N \rho_N^N, \tag{23}$$

where $L$ is the length of the signal curve. Since $N - 1$
normalized noise components are normal to the curve,
$\rho_s > \|\tilde{\mathbf{n}}_a\| \approx \sqrt{((N-1)/N)\sigma_n^2}$ for very large N. Substituting this
and (21) in (23) we obtain

$$L B_{N-1} \left(\frac{N-1}{N}\sigma_n^2\right)^{\frac{N-1}{2}} < B_N \left(P_N + \sigma_n^2\right)^{\frac{N}{2}}, \tag{24}$$

which limits the length of the curve to

$$L < \frac{B_N}{B_{N-1}}\, \sigma_n \left(\frac{N}{N-1}\right)^{\frac{N-1}{2}} \left(1 + \frac{P_N}{\sigma_n^2}\right)^{\frac{N}{2}}. \tag{25}$$

A further elaboration on this is given in [21, pp. 673-674].
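
The volume formula (22) and the length bound (25) are easy to evaluate numerically. The sketch below is a minimal illustration: it implements the even/odd form of (22), the equivalent Gamma-function form B_N = pi^(N/2) / Gamma(N/2 + 1) (a standard identity, used here to stay in the log domain for large N), and the logarithm of the right hand side of (25). The function names and the argument `csnr` (the ratio P_N / sigma_n^2 as a linear power ratio) are illustrative choices made here.

```python
import math


def unit_ball_volume(n):
    """Volume B_N of the unit ball in R^N from the even/odd formulas of (22)."""
    if n % 2 == 0:
        return math.pi ** (n // 2) / math.factorial(n // 2)
    k = (n - 1) // 2
    return 2 ** n * math.pi ** k * math.factorial(k) / math.factorial(n)


def log_unit_ball_volume(n):
    """ln B_N via the Gamma function form pi^(N/2) / Gamma(N/2 + 1)."""
    return 0.5 * n * math.log(math.pi) - math.lgamma(0.5 * n + 1.0)


def log_curve_length_bound(n, csnr, sigma_n):
    """Natural logarithm of the right hand side of (25), valid for N >= 2."""
    return (log_unit_ball_volume(n) - log_unit_ball_volume(n - 1)
            + math.log(sigma_n)
            + 0.5 * (n - 1) * math.log(n / (n - 1))
            + 0.5 * n * math.log1p(csnr))


if __name__ == "__main__":
    for n in (1, 2, 3, 10):
        print(f"B_{n} = {unit_ball_volume(n):.6f}")
    # Growth of the length bound with N at a CSNR of 10 (linear), sigma_n = 0.1.
    for n in (2, 4, 8, 16):
        print(f"N = {n:2d}: ln(L_max) = {log_curve_length_bound(n, 10.0, 0.1):.3f}")
```

Working with logarithms avoids overflow, since the bound grows exponentially in N through the factor involving the CSNR.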

B. Asymptotic analysis of M:N expanding SK-mappings 

In this section the result from section IV-A is generalized to include
vector sources, to further show that one can reach OPTA by
letting the mappings' dimensionality grow to infinity. A Gaussian
source with limited Euclidean norm will be distributed
on an M-disc or ball. This ball is stretched and twisted by
$\mathbf{S}$ into the channel hyper-sphere. To find the volume of the
channel space that the transformed source occupies when the
anomalous errors should be almost avoided, a structure that
encloses $\mathcal{S}$ needs to be found (like in the 1:N case). This
structure must be of the same dimension as the channel sphere,
in order to be able to compare their volumes. The structure
chosen is $\mathcal{S} \times \mathbb{S}^{N-M-1}$ (which is a natural generalization of
the "cylinder" in section IV-A). $\mathcal{S}$ can be considered to be a
ball (or M-disc) with a certain radius $\rho_M$ when considering
shape preserving SK-mappings. $\mathbb{S}^{N-M-1}$ is a hyper sphere
with radius $\rho_{MN}$ ($\rho_{MN} > \|\tilde{\mathbf{n}}_a\|$ at least, to avoid anomalous
errors). If the channel power constraint is to be satisfied and
the anomalous errors avoided, the following inequality must
be obeyed

$$B_M \rho_M^M B_{N-M} \rho_{MN}^{N-M} \leq B_N \rho_N^N. \tag{26}$$

Again, the noise vectors can at each point (of the surface) be
decomposed into two statistically independent contributions:
M components tangential to the signal hyper surface, $\tilde{\mathbf{n}}_{wn}$
(contributing to the weak noise distortion), and $N - M$ components
normal to it, $\tilde{\mathbf{n}}_a$ (which are the ones causing anomalous
errors). Assuming that N is so large that the sphere hardening
limit ($\rho_{MN} \to \sqrt{(N-M)/N}\,\sigma_n$) can be approximately taken
into account, (26) turns into

$$B_M \rho_M^M B_{N-M} \left(\sqrt{\frac{N-M}{N}}\,\sigma_n\right)^{N-M} < B_N \left(P_N + \sigma_n^2\right)^{\frac{N}{2}}. \tag{27}$$



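Assuming a shape preserving mapping, the stretch, and thereby
the weak noise distortion, is determined by the size of the
radius $\rho_M$ of $\mathcal{S}$. Solving (27) with respect to $\rho_M$ gives

$$\rho_M^M < \frac{B_N}{B_M B_{N-M}} \left(\frac{1}{1 - M/N}\right)^{\frac{N-M}{2}} \sigma_n^M \left(1 + \frac{P_N}{\sigma_n^2}\right)^{\frac{N}{2}}, \tag{28}$$

and so the following restriction on the radius of $\mathcal{S}$ has to be
obeyed

$$\rho_M < \sqrt[M]{\tilde{B}}\,\sigma_n \left(\frac{1}{1 - M/N}\right)^{\frac{N-M}{2M}} \left(1 + \frac{P_N}{\sigma_n^2}\right)^{\frac{N}{2M}}. \tag{29}$$

For both even and odd N

$$\tilde{B} = \frac{B_N}{B_M B_{N-M}} = \frac{\Gamma\left(\frac{M}{2} + 1\right)\Gamma\left(\frac{N-M}{2} + 1\right)}{\Gamma\left(\frac{N}{2} + 1\right)}. \tag{30}$$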



This can be shown in a similar way as in Appendix D,
substituting the symbols in question. For $M = 1$ in (29),
$\rho_M = L/2$ compared to (25), which shows that (29) is a
generalization of (25). (30) can be expressed in terms of the
Beta function using the following relation [33, p. 9]

$$ \mathcal{B}(\varrho,\varsigma) = \int_0^1 t^{\varrho-1} (1-t)^{\varsigma-1} \, dt = \frac{\Gamma(\varrho)\Gamma(\varsigma)}{\Gamma(\varrho+\varsigma)} \tag{31} $$

and the functional relation of the Gamma function [33, p. 3]

$$ \Gamma(a+1) = a\Gamma(a). \tag{32} $$



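Letting $\varrho = (N-M)/2 + 1$ and $\varsigma = M/2 + 1$ and using the
above relations gives

$$ \tilde{B} = \left( \frac{N}{2}+1 \right) \mathcal{B}\left( \frac{N-M}{2}+1, \frac{M}{2}+1 \right) = \left( \frac{N}{2}+1 \right) \mathcal{B}_{(N,M)}. \tag{33} $$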
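The identity between the volume ratio in (30) and the Beta function form in (33) can be checked numerically. The following Python sketch is an illustration only; the function names are assumptions introduced here, and the unit ball volume $B_k = \pi^{k/2}/\Gamma(k/2+1)$ is the standard formula, assumed to agree with the definition used earlier in the paper.

```python
import math


def unit_ball_volume(k):
    """Volume B_k of the unit-radius ball in R^k: pi^(k/2) / Gamma(k/2 + 1)."""
    return math.pi ** (k / 2) / math.gamma(k / 2 + 1)


def beta_function(a, b):
    """Beta function via the Gamma relation (31), using log-Gamma for stability."""
    return math.exp(math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b))


def b_tilde_from_volumes(N, M):
    """Ratio B_N / (B_M * B_{N-M}), the left-hand form of (30)."""
    return unit_ball_volume(N) / (unit_ball_volume(M) * unit_ball_volume(N - M))


def b_tilde_from_beta(N, M):
    """Right-hand side of (33): (N/2 + 1) * Beta((N-M)/2 + 1, M/2 + 1)."""
    return (N / 2 + 1) * beta_function((N - M) / 2 + 1, M / 2 + 1)


if __name__ == "__main__":
    # Both columns should agree for even and odd dimensions alike.
    for N, M in [(2, 1), (3, 1), (6, 2), (9, 4)]:
        print(N, M, b_tilde_from_volumes(N, M), b_tilde_from_beta(N, M))
```

For the 1:2 case the ratio is $\pi/4$, which both functions reproduce.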

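Since a shape preserving mapping is considered, the weak
noise distortion is given by (19), but the decomposition (of $\mathbf{n}$)
and normalization have to be taken into account. Since $M$ of the
$N$ components of the normalized noise vectors are the ones
contributing to the weak noise distortion

$$ \bar{\varepsilon}_{wn}^2 = \frac{E\{\|\mathbf{n}_{wn}\|^2\}}{\rho_M^2} = \frac{M\sigma_n^2}{N\rho_M^2} \geq \frac{M}{N} \left[ \left( \frac{N}{2}+1 \right) \mathcal{B}_{(N,M)} \right]^{-\frac{2}{M}} \left( 1 - \frac{M}{N} \right)^{\frac{N-M}{M}} \left( 1 + \frac{P_N}{\sigma_n^2} \right)^{-\frac{N}{M}}. \tag{34} $$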



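$\bar{\varepsilon}_{wn}^2$ can be considered as the total distortion $D_t$ of the $M$:$N$
system since the anomalous errors are almost absent when
$\rho_{MN} > \|\mathbf{n}_a\|$ and $M, N$ are close to infinity. Now assume a
fixed bandwidth expansion $r = N/M$. Then $M = N/r$, and
so

$$ D_t = \frac{1}{r} \left( 1 - \frac{1}{r} \right)^{r-1} \left[ \left( \frac{N}{2}+1 \right) \mathcal{B}_{(N,r)} \right]^{-\frac{2r}{N}} \left( 1 + \frac{P_N}{\sigma_n^2} \right)^{-r}, \tag{35} $$

where

$$ \mathcal{B}_{(N,r)} = \int_0^1 t^{\frac{N}{2}\left(1-\frac{1}{r}\right)} (1-t)^{\frac{N}{2r}} \, dt. \tag{36} $$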

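To show that this system can reach OPTA, one needs to show
that

$$ \lim_{N\to\infty} \left( \frac{N}{2}+1 \right)^{-\frac{2r}{N}} \mathcal{B}_{(N,r)}^{-\frac{2r}{N}} \, \frac{1}{r} \left( 1 - \frac{1}{r} \right)^{r-1} = 1. \tag{37} $$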

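Using the product rule for limits [35, p. 68], we eliminate the
first term on the left hand side of (37) since

$$ \lim_{N\to\infty} \left( \frac{N}{2}+1 \right)^{-\frac{2r}{N}} = 1. \tag{38} $$

So the problem is reduced to showing that

$$ \lim_{N\to\infty} \mathcal{B}_{(N,r)}^{-\frac{2r}{N}} = r \left( 1 - \frac{1}{r} \right)^{-(r-1)}. \tag{39} $$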

Hölder's inequality, described in Appendix C, is used for this
purpose. Let $f(t) = t^{\frac{N}{2r}(r-1)} (1-t)^{\frac{N}{2r}}$ and $h = 1$ on $I = (0,1)$.
Further let $p = \infty$ and $q = 1$. Clearly both $f$ and $h$ are
Lebesgue integrable (for $N > 0$, $r > 1$), and the norms $\|f\|_\infty$
and $\|h\|_1$ exist. It is easy to see that $\|h\|_1 = 1$. To find $\|f\|_\infty$
the maximum of $f$ must be calculated. Differentiating $f$ with
respect to $t$ and then equating to zero, we obtain



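$$ t_0 = 1 - \frac{1}{r}. \tag{40} $$

Substitution of this into the expression for $f$ gives

$$ \mathcal{B}_{(N,r)} \leq \|f\|_\infty = \left( 1 - \frac{1}{r} \right)^{\frac{N}{2r}(r-1)} \left( \frac{1}{r} \right)^{\frac{N}{2r}}. \tag{41} $$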



$\|f\|_\infty$ will dominate more and more over the rest of the
contributions to the integral in (36) the larger $N$ gets, and
when $N \to \infty$ equality in (41) is obtained in the sense needed
for (39), i.e. after both sides are raised to the power $-2r/N$.
Raising the right hand side of (41) to the power $-2r/N$ gives the
wanted result.
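The limit argument can be illustrated numerically. The sketch below is not part of the original proof; it evaluates the factor that multiplies $(1 + P_N/\sigma_n^2)^{-r}$ in (35) with log-Gamma functions and compares the Beta integral (36) with the bound $\|f\|_\infty$. The function names are assumptions made for this example, and $N/r$ is allowed to be non-integer purely for numerical convenience.

```python
import math


def log_beta_Nr(N, r):
    """Logarithm of the Beta integral (36) for channel dimension N and expansion r."""
    a = (N / 2) * (1 - 1 / r) + 1
    b = N / (2 * r) + 1
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def log_sup_f(N, r):
    """Logarithm of max f(t), with f(t) = t^(N(r-1)/(2r)) (1-t)^(N/(2r)) at t = 1 - 1/r."""
    return (N / 2) * (1 - 1 / r) * math.log(1 - 1 / r) + (N / (2 * r)) * math.log(1 / r)


def expansion_prefactor(N, r):
    """Factor multiplying (1 + P_N / sigma_n^2)^(-r) in the distortion expression (35)."""
    log_bracket = math.log(N / 2 + 1) + log_beta_Nr(N, r)
    return (1 / r) * (1 - 1 / r) ** (r - 1) * math.exp(-(2 * r / N) * log_bracket)


if __name__ == "__main__":
    for N in (10, 100, 10_000, 1_000_000):
        print(N, expansion_prefactor(N, r=2.0))
```

The printed factor approaches 1 slowly as $N$ grows. For finite $N$ it lies below 1, which reflects that the asymptotic model ignores the variance of the noise norm, not that OPTA can be exceeded.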

The above result does not contain the source variance $\sigma_x^2$,
i.e. it is valid for unit variance. $\sigma_x$ can easily be included
by letting $\rho_M = \alpha\sigma_x$, where $\alpha$ is an amplification factor
(assuming a shape preserving mapping). Solving the new
equation with respect to $\alpha$, and substituting for $\rho_M$ in (34)
gives the wanted result.

It should be mentioned that the above result is valid for
the given fixed parameters, meaning that there is a distinct
optimal point. If $\sigma_n$ increases while $\rho_{MN}$ is kept constant,
the system breaks down rapidly, since the probability of
anomalous errors tends to 1. If $\sigma_n$ decreases while $\rho_{MN}$ is kept
constant, the packing of $\mathcal{S}$ becomes non-optimal and the system moves
further away from OPTA, but this time in a robust manner (like
a linear system, which is apparent from the derivation of the
weak noise distortion).

C. Comments on finite dimensional expanding SK-mappings 

At finite dimensionality the anomalous errors must be taken
into account in any case, since they will occur with a certain
small probability. This is because $\|\mathbf{n}\|$ will have a nonzero
variance (if one does not take into account the fact that the
variance of $\|\mathbf{n}\|$ gets larger when $N$ gets smaller, the performance
predicted in the above model will exceed OPTA when $N$
gets smaller). Assuming a nonzero (small) probability for the
threshold effect, an additional factor must be included in $\rho_{MN}$



Pmn 



N-M 



(42) 



Given a certain probability for anomalous errors, $\delta_{MN}$ can
be found from the cdf of (20), substituting $N - M$ for $N$
(since $N - M$ components are normal to $\mathcal{S}$). An additional
factor must also be included in $\rho_N$, since the variance of the
received channel vectors also increases



$$ \rho_N = \sqrt{P_N + \sigma_n^2 + \delta_N^2}. \tag{43} $$
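The nonzero spread of the noise norm at finite dimensionality, and its disappearance as $N$ grows (sphere hardening), can be illustrated with a small simulation. The Python sketch below is an illustration only: it models the $N - M$ normal noise components as i.i.d. zero-mean Gaussians and normalizes the norm by $\sqrt{N}$, which is assumed here to match the normalization behind $\rho_{MN}$. The function names, trial count and seed are arbitrary choices.

```python
import math
import random


def normal_noise_radius_stats(N, M, sigma_n, trials=2000, seed=0):
    """Estimate mean and standard deviation of ||n_a|| / sqrt(N).

    n_a holds the N - M noise components normal to the signal surface,
    modelled as i.i.d. zero-mean Gaussians with standard deviation sigma_n.
    """
    rng = random.Random(seed)
    radii = []
    for _ in range(trials):
        squared_norm = sum(rng.gauss(0.0, sigma_n) ** 2 for _ in range(N - M))
        radii.append(math.sqrt(squared_norm / N))
    mean = sum(radii) / trials
    std = math.sqrt(sum((v - mean) ** 2 for v in radii) / trials)
    return mean, std


def hardening_limit(N, M, sigma_n):
    """Sphere hardening limit sqrt((N - M) / N) * sigma_n."""
    return math.sqrt((N - M) / N) * sigma_n


if __name__ == "__main__":
    # Same ratio N/M = 4 at increasing dimension: the spread shrinks.
    for N, M in [(8, 2), (80, 20), (800, 200)]:
        mean, std = normal_noise_radius_stats(N, M, sigma_n=1.0)
        print(N, M, round(mean, 4), round(std, 4), round(hardening_limit(N, M, 1.0), 4))
```

At low dimension the spread of the normalized norm is a sizeable fraction of its mean, which is why a margin such as $\delta_{MN}$ is needed; at high dimension the norm concentrates around the hardening limit.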



Inserting the right values for $\delta_N$ and $\delta_{MN}$ and considering a
large CSNR (the anomalous distortion is almost absent at the
optimal point for large CSNR in many cases), the distortion
at the optimal point is approximately given by the weak noise
distortion, which in general is



Di 



TV 
~2 



1 



1 - 



1 



'('t( 1 + ;) + 1 '£ + 1 



(r-l) 



(44) 



N 



There seems to be more to gain in performance from the
sphere hardening effect the larger $r$ is (until $r$ gets "large", as
discussed below). Consider the $r = 2$ versus the $r = 3$ case.
In the 1:2 case one will have a curve in a region of the plane,
while in the 1:3 case one will have a curve in a region of space.
Given a certain CSNR value, the curve will always be longest
in the $r = 3$ case. Assuming that the dimension is increased
to 2:4 ($r = 2$) and 2:6 ($r = 3$) mappings, sphere hardening
will take place over the largest "space" (along $\mathcal{S}$) in the $r = 3$
case, meaning that there is more to "catch up to" (compared
to OPTA) by increasing dimensionality in the $r = 3$ case. This
can be stated generally as: A larger $r$ gives a larger $\mathcal{S}$ for a
given CSNR, which means that sphere hardening takes place
over a larger "space" (along $\mathcal{S}$). So one can expect that the
gap to OPTA (along the SNR axis) will be larger the larger $r$
is (this seems to be the case considering the practical 2:3, 1:2,
and 1:3 mappings proposed in ifTTl pp. 89-93], ifTTl pp. 65- 
66] and IfTTl pp. 50-51] respectively). However, when r gets 
so large that sphere hardening has a significant effect on the
channel space itself (large $N$), constant envelope modulation
(like PPM) will perform better and better.
This is because most of the channel signal will be located
more and more closely around the sphere defined by the peak power
constraint. In the limit $r \to \infty$ constant envelope modulation
like PPM and FPM will become optimal.

V. M:N Dimension Reducing SK-mappings

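Fig. 8 shows a block diagram for the dimension reducing
communication system under consideration.

[Figure: blocks labeled "Project and measure" and $\mathbf{S}(\cdot)$; signal labels $\mathbf{z} \in \mathcal{D} \subset \mathbb{R}^N$, $\hat{\mathbf{x}} = \mathbf{S}(\hat{\mathbf{z}})$, $\mathbf{S}(\mathbf{z}) \in \mathcal{S}$.]

Fig. 8. Block diagram of a general dimension reducing SK-system.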

In this section ideas from the expansion case in section III are
used as inspiration to develop a theory for $M$:$N$ dimension
reducing SK-mappings. To be able to reduce the dimension
of the source, its information must be reduced (when there
is a channel power constraint). This necessitates lossy
compression. Compression is done by approximating the
source vectors by their projection onto a hyper surface $\mathcal{S}$, which
is an $N$ dimensional subset of $\mathbb{R}^M$. This operation is denoted
$q(\mathbf{x}) \in \mathcal{S} \subset \mathbb{R}^M$. The dimension is subsequently changed
from $M$ to $N$ by a lossless operator $d_r : \mathcal{S} \to \mathcal{D} \subset \mathbb{R}^N$.
The total operation is named the projection operation, and
denoted $p = d_r \circ q : \mathbf{x} \in \mathbb{R}^M \mapsto p(\mathbf{x}) \in \mathcal{D} \subset \mathbb{R}^N$. As for
the expansion case, $\mathcal{S}$ is called the signal hyper surface (or
signal curve in the $M$:1 case). $\mathcal{S}$ will be a parametric hyper
surface with the channel signal $\mathbf{z}$ as parameters (exchange $\mathbf{x}$
with $\mathbf{z}$ and $N$ with $M$ in the corresponding parametric description
for the expansion case). The point on $\mathcal{S}$ corresponding
to $p(\mathbf{x})$ is given a convenient representation on the channel
through an invertible (vector valued) function $\ell$. This function
determines the way distances are measured from the origin
of $\mathcal{S}$ to the given approximated point. The inverse of $\ell$ is
denoted $\varphi$ (it plays a similar role as in the expansion case). The
vector $\mathbf{z} = \ell(p(\mathbf{x}))$ is transmitted over an AWGN channel
with noise $\mathbf{n} \in \mathbb{R}^N$. There will be two contributions to the
total distortion in this system: approximation distortion (from
the information reducing projection operation) and channel
distortion (the effect the channel noise has on the signal when
mapped through the SK-mapping). These are described
further in the following.

A. Channel distortion 

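The received vector $\hat{\mathbf{z}} = \mathbf{z} + \mathbf{n}$ must be passed through $\mathbf{S}$
to reconstruct $\mathbf{x}$. It is assumed that each component function
of $\mathbf{S}$ satisfies $S_i \in C^1(\mathbb{R}^N)$, $i = 1, \ldots, M$ ($\varphi(\cdot)$ is contained in $\mathbf{S}$
for convenience and without loss of generality). For a given
transmitted channel vector $\mathbf{z}_0$ and small deviations due to $\mathbf{n}$,
the received signal $\hat{\mathbf{x}} = \mathbf{S}(\hat{\mathbf{z}})$ can be approximated by

$$ \mathbf{S}(\mathbf{z}_0 + \mathbf{n}) \approx \mathbf{S}(\mathbf{z}_0) + J(\mathbf{z}_0)\mathbf{n}, \tag{45} $$

where $J(\mathbf{z}_0)$ is the Jacobian matrix of $\mathbf{S}$ evaluated at $\mathbf{z}_0$. The
last term in (45) is the error due to the channel noise. The MSE
per source component caused by the channel noise, given that
$\mathbf{z}_0$ was transmitted, is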

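$$ \varepsilon_{ch}^2(\mathbf{z}_0) = \frac{1}{M} E\{ (J(\mathbf{z}_0)\mathbf{n})^T (J(\mathbf{z}_0)\mathbf{n}) \} = \frac{1}{M} E\left\{ \left( \frac{\partial S_1}{\partial z_1} n_1 + \cdots + \frac{\partial S_1}{\partial z_N} n_N \right)^2 + \cdots + \left( \frac{\partial S_M}{\partial z_1} n_1 + \cdots + \frac{\partial S_M}{\partial z_N} n_N \right)^2 \right\}. \tag{46} $$

Since the noise on each sub-channel is assumed to be independent,
$E\{n_i n_j\} = \sigma_n^2 \delta_{ij}$. After some rearrangement,

$$ \varepsilon_{ch}^2(\mathbf{z}_0) = \frac{\sigma_n^2}{M} \left[ \left( \left( \frac{\partial S_1}{\partial z_1} \right)^2 + \cdots + \left( \frac{\partial S_M}{\partial z_1} \right)^2 \right) + \cdots + \left( \left( \frac{\partial S_1}{\partial z_N} \right)^2 + \cdots + \left( \frac{\partial S_M}{\partial z_N} \right)^2 \right) \right] = \frac{\sigma_n^2}{M} (g_{11} + g_{22} + \cdots + g_{NN}) = \frac{\sigma_n^2}{M} \sum_{i=1}^{N} g_{ii}. \tag{47} $$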

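$g_{ii}$ (partials with respect to $z_i$) are the diagonal components
of the metric tensor of $\mathcal{S}$. The channel distortion is given by

$$ \bar{\varepsilon}_{ch}^2 = E_{\mathbf{z}}\{ \varepsilon_{ch}^2(\mathbf{z}) \} = \frac{\sigma_n^2}{M} \int \cdots \int_{\mathcal{D}} \sum_{i=1}^{N} g_{ii}(\mathbf{z}) f_{\mathbf{z}}(\mathbf{z}) \, d\mathbf{z}. \tag{48} $$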



$g_{ii}$ will increase in magnitude when the signal hyper surface is
stretched. This means that the more a given surface is stretched
out in the source space, the larger the channel distortion
becomes. From this point of view, the surface should be
stretched as little as possible. This is the opposite of what
was wanted for the expansion case in section III-A.

Considering a shape preserving mapping ($g_{ii} = \alpha^2$ for all $i$), (48) is reduced to
(for the same reasons as mentioned in section III-A)

$$ \bar{\varepsilon}_{ch}^2 = \frac{N}{M} \sigma_n^2 \alpha^2, \tag{49} $$

where $\alpha$ is the amplification/attenuation from the channel to
the source at the receiver.
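The per-point channel distortion $(\sigma_n^2/M)\sum_i g_{ii}$ can be evaluated numerically for any differentiable mapping. The following Python sketch is only an illustration: it approximates the Jacobian by central differences and applies the result to a hypothetical 2:1 mapping (a circular arc traversed at constant speed $\alpha$), chosen here because it is shape preserving with $g_{11} = \alpha^2$. All function names and parameter values are assumptions made for the example.

```python
import math


def jacobian(S, z, h=1e-6):
    """Central-difference Jacobian of S at z; entry [i][j] approximates dS_i/dz_j."""
    M, N = len(S(z)), len(z)
    J = [[0.0] * N for _ in range(M)]
    for j in range(N):
        z_plus, z_minus = list(z), list(z)
        z_plus[j] += h
        z_minus[j] -= h
        s_plus, s_minus = S(z_plus), S(z_minus)
        for i in range(M):
            J[i][j] = (s_plus[i] - s_minus[i]) / (2 * h)
    return J


def channel_distortion(S, z, sigma_n):
    """Weak-noise channel distortion (sigma_n^2 / M) * sum_i g_ii at the point z."""
    J = jacobian(S, z)
    M, N = len(J), len(J[0])
    g_diag = [sum(J[i][j] ** 2 for i in range(M)) for j in range(N)]
    return sigma_n ** 2 / M * sum(g_diag)


def circular_arc(z, alpha=3.0, radius=5.0):
    """Hypothetical 2:1 mapping: arc of a circle traversed with constant speed alpha."""
    angle = alpha * z[0] / radius
    return [radius * math.cos(angle), radius * math.sin(angle)]


if __name__ == "__main__":
    # For this mapping the result should match (N / M) * sigma_n^2 * alpha^2 with N = 1, M = 2.
    print(channel_distortion(circular_arc, [0.7], sigma_n=0.1))
```

Doubling $\alpha$ quadruples the result, which makes the cost of stretching the surface explicit.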



B. Approximation distortion 

The approximation distortion results from the lossy projec- 
tion operation p. The size of it is linked to the minimum 
distance each source vector has to S. In order to make the 
approximation distortion as small as possible, S has to fill 
the source space as densely as possible (it must be stretched 
and twisted inside a given region). This is in conflict with the 
requirement of reducing the channel distortion, i.e. there is a 
tradeoff between the two distortion contributions. 

Since the approximation distortion is structure dependent,
one cannot expect to find a general closed-form expression
describing it. But to be able to do an asymptotic analysis for reducing
mappings, some general expression valid for high dimensions
is needed. By assuming an imbedding of $\mathcal{S}$ where at each
point, the distance to the closest point on a different part of
$\mathcal{S}$ (we do not mean neighborhood) is constant and equal to $\Delta$
(uniform structure), a similar analysis as for uniform vector
quantizers can be used (every centroid having an equal fixed
distance $\Delta$ to each neighboring centroid). For a uniform
vector quantizer in $m$ dimensions ($m$ will be related to $M$ and
$N$ later) the distortion can be lower bounded by assuming
the decision regions around each centroid to be $(m-1)$-spherical [36].
Denote the radius of the $(m-1)$-sphere $\rho_m$.
The pdf of the quantization distortion is a uniform spherical
distribution (a uniform distribution with spherical support, in
this case of radius $\Delta/2$) given by (77) in Appendix D. Using
this distribution one can show that the quantization distortion
is lower bounded by

$$ \bar{\varepsilon}_q^2 = E\{\rho_m^2\} = \frac{m}{4(m+2)} \Delta^2. \tag{50} $$

The derivation of (50) is given in Appendix D. Due to the fact
that the decision regions become spherical when $m \to \infty$,
using the right construction [36], (50) will be exact when
$m \to \infty$ (called sphere packing). Notice that this expression
differs from the well known distortion lower bound derived by
Gersho in [36]. The reason for this is that Gersho's distortion
expression is scaling invariant (independent of the size of
the cells containing the centroids), whereas here we want the
distortion to depend on the size of the cells so it can be made
dependent on the CSNR.

The expression in (50) must be modified to take into
account an approximation to a general $N$-dimensional signal
hyper surface. The question is how. Consider first a uniform
2:1 system (uniform spacing between any two parts of it)
consisting of concentric circles around the origin in $\mathbb{R}^2$ with a
radial distance $\Delta$ between them. The $p$ operator will introduce
an approximation distortion equivalent to that of a scalar quantizer,
except that it will be scaled by 1/2 (distributed between two
source components), i.e. $\bar{\varepsilon}_a^2 = \Delta^2/24$. Furthermore, consider a
uniform 3:2 mapping consisting of concentric spheres around
the origin in $\mathbb{R}^3$ with radial distance $\Delta$ between them. This is
again equivalent to a scalar quantizer, except that the distortion
is scaled by 1/3. Now consider a 3:1 mapping consisting of
circles of different radii ($n\Delta$ for $n \in \mathbb{N}$) lying on parallel
planes in $\mathbb{R}^3$ with an equal distance $\Delta$ between them (filling
out a ball like region in the source space). Now $p$ introduces
an approximation distortion equivalent to that of a 2D uniform vector
quantizer, except that it will be scaled by 1/3. The same can
be said about a similarly constructed 4:2 mapping consisting
of concentric spheres, except that the distortion will be scaled
by 1/4. From this one can reason that the approximation
distortion in general can be approximated by (50) by substituting
$m = M - N$ and scaling by $1/M$ (distortion per source
dimension), i.e. the approximation distortion will be given by

$$ \bar{\varepsilon}_a^2 = \frac{M-N}{4M(M-N+2)} \Delta^2 \tag{51} $$

for large $M, N$. As for vector quantizers, (51) serves
as a lower bound, and becomes the actual distortion when
$M, N \to \infty$ if the right construction (uniform imbedding) is
chosen. Notice that in (51), $\Delta$ must be a function of the CSNR
for the expression to be analyzed properly.
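The spherical-cell expression (50) and its scaled version (51) are easy to check numerically. The Python sketch below is an illustration under stated assumptions: it draws points uniformly from an $m$-dimensional ball of radius $\Delta/2$ (Gaussian direction, radius $\propto U^{1/m}$) and compares the mean squared norm with the closed form. The function names, sample count and seed are arbitrary choices made here.

```python
import math
import random


def ball_distortion_closed_form(m, delta):
    """Right-hand side of (50): m * delta^2 / (4 * (m + 2))."""
    return m * delta ** 2 / (4 * (m + 2))


def ball_distortion_monte_carlo(m, delta, samples=20000, seed=1):
    """Mean squared norm of points drawn uniformly from an m-ball of radius delta / 2."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(samples):
        direction = [rng.gauss(0.0, 1.0) for _ in range(m)]
        norm = math.sqrt(sum(c * c for c in direction))
        radius = (delta / 2) * rng.random() ** (1.0 / m)
        point = [radius * c / norm for c in direction]
        total += sum(c * c for c in point)
    return total / samples


def approximation_distortion(M, N, delta):
    """Right-hand side of (51): (50) with m = M - N, scaled by 1 / M."""
    return ball_distortion_closed_form(M - N, delta) / M


if __name__ == "__main__":
    print(ball_distortion_closed_form(3, 2.0), ball_distortion_monte_carlo(3, 2.0))
    print(approximation_distortion(2, 1, 1.0))  # 2:1 case with delta = 1
```

For $m = 1$ the closed form reduces to the familiar $\Delta^2/12$ of a uniform scalar quantizer, and the 2:1 case gives $\Delta^2/24$ as stated in the text.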

VI. The Effect of Dimensionality Increase in Dimension Reducing SK-mappings

In this section it is shown that $M$:$N$ dimension reducing
systems can also close in on OPTA by letting $M, N \to \infty$.

As mentioned in section V-B, there is a tradeoff between
minimization of the channel distortion and the approximation
distortion. To show that it is possible to reach OPTA one has
to consider how much the signal surface $\mathcal{S}$ must be stretched
in order to cover the source space as properly as possible for
a given CSNR without letting the channel distortion become
too large. This should be done in a general manner without
reference to a specific hyper surface. Again one solution is
to consider volumes. To find out how much space the signal
surface will occupy for a certain approximation distortion, it
needs to be enclosed in an entity that has the same dimension
as the source sphere (to be able to compare the volumes). The
entity chosen is $\mathcal{S} \times \mathbb{S}^{M-N-1}$. $\mathcal{S}$ is a ball-like structure with
radius $\rho_N$ (considering a shape preserving mapping), while
$\mathbb{S}^{M-N-1}$ is a hyper-sphere with radius $\rho_{MN}$. At all points the
distance between the two closest points of any part of $\mathcal{S}$ (not
meaning neighborhood) is kept constant and equal to $\Delta$, and so
$\rho_{MN} = \Delta/2$. The reason why this uniform covering is chosen
is that (51) can be used as an expression for the approximation
distortion ($M, N$ large). Notice that in this section (compared
to section IV-B) we do not normalize the source and channel
signals with respect to either the source or channel dimension
(more convenient in this case).

To make the approximation distortion as small as possible,
$\mathcal{S} \times \mathbb{S}^{M-N-1}$ should cover the entire source space (for large
$M$ and $N$, and in general the space which has a significant
probability associated with it), i.e. the following inequality
must be satisfied



$$ B_N \rho_N^N \, B_{M-N} \rho_{MN}^{M-N} \geq B_M \rho_M^M. \tag{52} $$



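$\rho_M = \|\mathbf{x}\| = \sqrt{M(\sigma_x^2 + \delta_M)}$ is the radius of the source space
and $\rho_N = \alpha\sqrt{N(P_N + \sigma_n^2 + \delta_N)}$ is the radius of the channel
space, where $\alpha$ is an amplification factor, $P_N$ is the channel
power per channel dimension, and $\sigma_n^2$ is the noise variance per
channel dimension. $\delta_M$ and $\delta_N$ are included to take the sphere
hardening effect into account, so $\delta_M \to 0$ as $M \to \infty$ and
$\delta_N \to 0$ as $N \to \infty$. Inserting the above in (52) with
$\rho_{MN} = \Delta/2$, neglecting $\delta_M$ and $\delta_N$ (large $M$ and $N$), and solving
with respect to $\alpha$ at equality, we obtain

$$ \alpha = \sqrt[N]{\tilde{B}} \left( \frac{\Delta}{2} \right)^{-\frac{M-N}{N}} \frac{M^{\frac{M}{2N}} \sigma_x^{\frac{M}{N}}}{\sqrt{N(P_N + \sigma_n^2)}}, \tag{53} $$

where

$$ \tilde{B} = \frac{B_M}{B_{M-N} B_N} = \frac{\Gamma\left(\frac{M-N}{2}+1\right) \Gamma\left(\frac{N}{2}+1\right)}{\Gamma\left(\frac{M}{2}+1\right)}. \tag{54} $$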



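The last equality in (54) can be shown by using the expression
for the unit radius hyper spheres given in (22) and a similar
derivation as in Appendix D. A shape preserving mapping is
assumed. Then by inserting (53) in (49), a general expression
for the channel distortion is found as

$$ \bar{\varepsilon}_{ch}^2 = M^{\frac{M}{N}-1} \tilde{B}^{\frac{2}{N}} \sigma_x^{\frac{2M}{N}} \left( \frac{\Delta}{2} \right)^{-\frac{2(M-N)}{N}} \left( 1 + \frac{P_N}{\sigma_n^2} \right)^{-1}. \tag{55} $$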



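We assume that the approximation distortion and the channel
distortion are independent (a matter of construction), and so
the total distortion will be given by the sum

$$ D_t = \bar{\varepsilon}_a^2 + \bar{\varepsilon}_{ch}^2 = \frac{M-N}{4M(M-N+2)} \Delta^2 + M^{\frac{M}{N}-1} \tilde{B}^{\frac{2}{N}} \sigma_x^{\frac{2M}{N}} \left( \frac{\Delta}{2} \right)^{-\frac{2(M-N)}{N}} \left( 1 + \frac{P_N}{\sigma_n^2} \right)^{-1}. \tag{56} $$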

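Now the optimal $\Delta$ needs to be determined. Differentiating
(56) with respect to $\Delta$, equating to zero and solving for
$\Delta$, we obtain

$$ \Delta_{opt} = \left[ \frac{4M(M-N+2)}{M-N} \cdot \frac{M-N}{N} \, 2^{\frac{2(M-N)}{N}} M^{\frac{M}{N}-1} \tilde{B}^{\frac{2}{N}} \sigma_x^{\frac{2M}{N}} \left( 1 + \frac{P_N}{\sigma_n^2} \right)^{-1} \right]^{\frac{N}{2M}} = 2\sqrt{M}\, \tilde{B}^{\frac{1}{M}} \sigma_x \left( \frac{M-N+2}{N} \right)^{\frac{N}{2M}} \left( 1 + \frac{P_N}{\sigma_n^2} \right)^{-\frac{N}{2M}}. \tag{57} $$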
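The optimization over $\Delta$ can be sanity-checked numerically. The Python sketch below is an illustration under assumptions: it takes the total distortion as the sum of the approximation distortion (51) and a channel term of the form $M^{M/N-1}\tilde{B}^{2/N}\sigma_x^{2M/N}(\Delta/2)^{-2(M-N)/N}(1+\mathrm{CSNR})^{-1}$, with $\mathrm{CSNR} = P_N/\sigma_n^2$, and compares the stationary point $\Delta = 2\sqrt{M}\,\tilde{B}^{1/M}\sigma_x((M-N+2)/N)^{N/(2M)}(1+\mathrm{CSNR})^{-N/(2M)}$ with nearby values. The function names are introduced only for this example.

```python
import math


def log_b_tilde(M, N):
    """log of Gamma((M-N)/2 + 1) * Gamma(N/2 + 1) / Gamma(M/2 + 1), as in (54)."""
    return math.lgamma((M - N) / 2 + 1) + math.lgamma(N / 2 + 1) - math.lgamma(M / 2 + 1)


def total_distortion(delta, M, N, csnr, sigma_x=1.0):
    """Approximation distortion plus channel distortion for spacing delta."""
    approximation = (M - N) * delta ** 2 / (4 * M * (M - N + 2))
    log_channel = (
        (M / N - 1) * math.log(M)
        + (2 / N) * log_b_tilde(M, N)
        + (2 * M / N) * math.log(sigma_x)
        - (2 * (M - N) / N) * math.log(delta / 2)
        - math.log1p(csnr)
    )
    return approximation + math.exp(log_channel)


def delta_opt(M, N, csnr, sigma_x=1.0):
    """Closed-form minimizer of total_distortion with respect to delta."""
    log_delta = (
        math.log(2)
        + 0.5 * math.log(M)
        + log_b_tilde(M, N) / M
        + math.log(sigma_x)
        + (N / (2 * M)) * (math.log((M - N + 2) / N) - math.log1p(csnr))
    )
    return math.exp(log_delta)


if __name__ == "__main__":
    M, N, csnr = 4, 2, 100.0
    d = delta_opt(M, N, csnr)
    print(d, total_distortion(d, M, N, csnr))
    print(total_distortion(0.9 * d, M, N, csnr), total_distortion(1.1 * d, M, N, csnr))
```

Moving $\Delta$ away from the closed-form value in either direction increases the total distortion, which is the tradeoff between the two contributions described above.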



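(54) can be expressed in terms of the Beta function using (31)
and (32). Letting $\varrho = (M-N)/2 + 1$ and $\varsigma = N/2 + 1$ then

$$ \tilde{B} = \left( \frac{M}{2}+1 \right) \mathcal{B}\left( \frac{M-N}{2}+1, \frac{N}{2}+1 \right) = \left( \frac{M}{2}+1 \right) \mathcal{B}_{(M,N)}. \tag{58} $$

Inserting (57) and (58) in (56) gives

$$ D_t = \left( 1 + \frac{N}{M-N} \right) \left( \frac{M-N}{M-N+2} \right)^{\frac{M-N}{M}} \left( \frac{M-N}{N} \right)^{\frac{N}{M}} \left[ \left( \frac{M}{2}+1 \right) \mathcal{B}_{(M,N)} \right]^{\frac{2}{M}} \sigma_x^2 \left( 1 + \frac{P_N}{\sigma_n^2} \right)^{-\frac{N}{M}}. \tag{59} $$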

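Further assume a fixed dimension reduction $r = N/M$, then
$N = Mr$. Substitution into (59) gives

$$ D_t = \left( 1 + \frac{r}{1-r} \right) \left( \frac{1-r}{1-r+2/M} \right)^{1-r} \left( \frac{1-r}{r} \right)^{r} \left[ \left( \frac{M}{2}+1 \right) \mathcal{B}_{(M,r)} \right]^{\frac{2}{M}} \sigma_x^2 \left( 1 + \frac{P_N}{\sigma_n^2} \right)^{-r}. \tag{60} $$

Since

$$ \lim_{M\to\infty} \left( \frac{M}{2}+1 \right)^{\frac{2}{M}} = 1 \quad \text{and} \quad \lim_{M\to\infty} \left( \frac{1-r}{1-r+2/M} \right)^{1-r} = 1, \tag{61} $$

and using the product rule for limits [35, p. 68], we have left
to show that

$$ \lim_{M\to\infty} \left( 1 + \frac{r}{1-r} \right) \left( \frac{1-r}{r} \right)^{r} \mathcal{B}_{(M,r)}^{\frac{2}{M}} = 1, \tag{62} $$

where

$$ \mathcal{B}_{(M,r)} = \int_0^1 t^{\frac{M}{2}(1-r)} (1-t)^{\frac{M}{2}r} \, dt. \tag{63} $$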

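Doing a similar analysis as in section IV-B one will find that

$$ \mathcal{B}_{(M,r)} \leq (1-r)^{\frac{M}{2}(1-r)} r^{\frac{M}{2}r}, \tag{64} $$

with equality when $M \to \infty$. Then

$$ \left( 1 + \frac{r}{1-r} \right) \left( \frac{1-r}{r} \right)^{r} \mathcal{B}_{(M,r)}^{\frac{2}{M}} \to \left( 1 + \frac{r}{1-r} \right) \left( \frac{1-r}{r} \right)^{r} (1-r)^{(1-r)} r^{r} = 1 \tag{65} $$

when $M \to \infty$, which is what we wanted to show.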
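The convergence can also be observed numerically. The Python sketch below is an illustration, not part of the original argument: it evaluates the factor $(1 + \frac{r}{1-r})(\frac{1-r}{1-r+2/M})^{1-r}(\frac{1-r}{r})^{r}[(\frac{M}{2}+1)\mathcal{B}]^{2/M}$, where $\mathcal{B}$ is the Beta function with arguments $(M-N)/2+1$ and $N/2+1$ and $N = rM$, for increasing source dimension $M$. The function name is an assumption, and $rM$ is allowed to be non-integer purely for numerical convenience.

```python
import math


def reduction_prefactor(M, r):
    """Factor multiplying sigma_x^2 * (1 + P_N / sigma_n^2)^(-r) for N = r * M."""
    N = r * M
    # log of (M/2 + 1) * Beta((M-N)/2 + 1, N/2 + 1), computed with log-Gamma
    log_bracket = (
        math.log(M / 2 + 1)
        + math.lgamma((M - N) / 2 + 1)
        + math.lgamma(N / 2 + 1)
        - math.lgamma(M / 2 + 2)
    )
    return (
        (1 + r / (1 - r))
        * ((1 - r) / (1 - r + 2 / M)) ** (1 - r)
        * ((1 - r) / r) ** r
        * math.exp((2 / M) * log_bracket)
    )


if __name__ == "__main__":
    for M in (4, 100, 10_000, 1_000_000):
        print(M, reduction_prefactor(M, r=0.5))
```

The factor tends to 1 as $M$ grows, so the distortion approaches $\sigma_x^2(1 + P_N/\sigma_n^2)^{-r}$ in the limit. Values at small $M$ should be read with care, since the model is only meant for large $M, N$.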

VII. Summary and Discussion 

In this paper a theory for SK-mappings has been introduced.
This theory has further been used to show that SK-mappings
have the potential to reach OPTA for any dimension
change $r \in \mathbb{Q}_+$ as the dimensionality grows to infinity.

First a theory for dimension expanding mappings was
developed, showing that the overall objective of constructing
expanding mappings is to find a structure $\mathcal{S}$ (a signal hyper
surface representing the source on the channel) which fills a
power constrained region on the channel as properly as possible,
by stretching it out like a "sheet of rubber" (minimizing
the weak noise distortion). This should be done while the
distance between any two folds/parts of $\mathcal{S}$ is kept as large
as possible, in order to minimize the effect of the anomalous
distortion. This gives a tradeoff between the two distortion
contributions, and thus a unique minimum distortion for a
given CSNR. This tradeoff is similar to the one in traditional
channel coding, where it is desired to place as many codewords
as possible into a constrained region (to make the rate as large
as possible), and at the same time have as large a distance
between them as possible (to minimize the probability of
exchanging codewords), as illustrated in [3, pp. 242-243].

The theory developed was used to show that OPTA can 
be reached in the limit of infinite dimensionality by using a 
uniform (fixed distance between any folds of S) and shape 
preserving (a diagonal metric tensor with constant elements) 
structure. This is not necessarily optimal in the finite dimen- 
sional case (especially for low dimensions) which is illustrated 
in El pp. 294-297] and fP71 . 

Furthermore, a theory for dimension reducing systems was
introduced, showing that the overall objective in constructing
reducing systems is to choose a structure $\mathcal{S}$ (representing the
channel signal in the source space) that covers the source space
as properly as possible by keeping the distance between any
folds of it as small as possible, in order to minimize the
approximation distortion. But on the other hand $\mathcal{S}$ should
be stretched out as little as possible to minimize the channel
distortion. This gives a tradeoff between two distortion
contributions also for reducing mappings. The tradeoff is quite
similar to the one in lossy source coding, where it is desired
to cover the source space as properly as possible (to minimize
the distortion) with as few representation vectors as possible
(to minimize the rate), as illustrated in [3, pp. 357-358].

Further it was shown, using the theory developed, that 
also dimension reducing systems can reach OPTA when the 
dimensionality grows to infinity, again using uniform shape 
preserving mappings. 

Further research should aim at finding methods for
determining the optimal global geometrical structure given the
source and channel pdf's and their dimensions. Hopefully the
theory introduced in this paper can be extended or modified
to some sort of variational calculus problem [37]. If it is
possible to find such differential equations, they will probably
be solvable analytically for low dimensional spaces and well
behaved pdf's only.

The proposed theory might also be generalized to a
more general network setting (e.g. something along the lines
of [38]), being an alternative specifically in some cases where
no separation theorem can be provided.

Appendix A 
The Metric Tensor 

See HQ pp.301-347] or (39] pp. 43-53] for definition and 
more involved details. 

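Consider an imbedding of an $M$-manifold $\mathcal{S}$ given by the
parametric equation

$$ \mathbf{S}(\mathbf{x}) = [S_1(\mathbf{x}), S_2(\mathbf{x}), \ldots, S_N(\mathbf{x})] \tag{66} $$

where $S_i$ are component functions. The metric tensor for a
smooth imbedding of $\mathcal{S}$ in $\mathbb{R}^N$ ($M < N$) is given by:

$$ G = J^T J = \begin{bmatrix} g_{11} & g_{12} & \cdots & g_{1M} \\ g_{21} & g_{22} & \cdots & g_{2M} \\ \vdots & \vdots & \ddots & \vdots \\ g_{M1} & g_{M2} & \cdots & g_{MM} \end{bmatrix}, \tag{67} $$

where $J$ is the Jacobian [39, p. 47] of $\mathbf{S}$, given by

$$ J = \begin{bmatrix} \frac{\partial S_1}{\partial x_1} & \frac{\partial S_2}{\partial x_1} & \cdots & \frac{\partial S_N}{\partial x_1} \\ \frac{\partial S_1}{\partial x_2} & \frac{\partial S_2}{\partial x_2} & \cdots & \frac{\partial S_N}{\partial x_2} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial S_1}{\partial x_M} & \frac{\partial S_2}{\partial x_M} & \cdots & \frac{\partial S_N}{\partial x_M} \end{bmatrix}^T. \tag{68} $$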

The metric tensor $G$ is symmetric and positive definite [40,
pp. 208-209]. $g_{ii}$ can be interpreted as the squared length of
the tangent vector in the direction of parameter $x_i$, where $x_i$ is
the $i$'th parameter in a parametric description of $\mathcal{S}$. All "cross
terms" $g_{ij}$ are the inner products of the tangent vectors in the
directions of $x_i$ and $x_j$.
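As a small worked example of these definitions, the Python sketch below builds $G = J^T J$ for a hypothetical 2:3 mapping $\mathbf{S}(\mathbf{x}) = [x_1, x_2, x_1 x_2]$. The mapping is chosen here only because its Jacobian is easy to write down by hand; it is not a mapping proposed in the paper, and the function names are illustrative.

```python
def saddle_jacobian(x1, x2):
    """Jacobian (N x M = 3 x 2) of the example mapping S(x) = [x1, x2, x1 * x2]."""
    return [
        [1.0, 0.0],
        [0.0, 1.0],
        [x2, x1],
    ]


def metric_tensor(J):
    """Metric tensor G = J^T J; entry (i, j) is the inner product of columns i and j of J."""
    N, M = len(J), len(J[0])
    return [
        [sum(J[k][i] * J[k][j] for k in range(N)) for j in range(M)]
        for i in range(M)
    ]


if __name__ == "__main__":
    G = metric_tensor(saddle_jacobian(2.0, 3.0))
    # Diagonal entries: squared tangent lengths. Off-diagonal: tangent inner products.
    for row in G:
        print(row)
```

At $(x_1, x_2) = (2, 3)$ the tangent vectors are $(1, 0, 3)$ and $(0, 1, 2)$, so the diagonal entries are their squared lengths and the off-diagonal entries their inner product; the tensor is symmetric with positive determinant.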



SUBMITTED TO IEEE TRANSACTIONS ON INFORMATION THEORY, APRIL 2009 






Appendix B 
Derivation of the weak noise distortion 

Here we show that 

1 2 M i 



(69) 



gives the smallest possible weak noise distortion. $G$ and $J$
are described in Appendix A. To simplify the analysis (so
the matrix multiplications can be avoided) the $N$-dimensional
noise vector $\mathbf{n}$ is replaced, without loss of generality, by its
$M$-dimensional projection $\mathbf{n}_P$, which will also be Gaussian (since
$P_{proj}$ is a linear transformation [30, p. 117]). Let $J = J(\mathbf{x}_0)$.
Further it is assumed that a hypothetical inverse $B = J^{-1}$ exists
(which is also the case when the analysis is restricted to the
$M$-dimensional tangent space). Let $\mathbf{S}_t$ denote the tangent hyper
plane of $\mathcal{S}$ at $\mathbf{x}_0$. Further, let the inverse of $\mathbf{S}$ be denoted $\mathbf{S}^{-1}$.
Considering weak noise, the linear approximation of $\mathbf{S}^{-1}$ can
be used. Taking all the above into account, the received vector
will be given by



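$$ \hat{\mathbf{x}} = \mathbf{S}^{-1}(\mathbf{S}_t(\mathbf{x}_0) + \mathbf{n}_P) \approx \mathbf{S}^{-1}(\mathbf{S}_t(\mathbf{x}_0)) + B\mathbf{n}_P = \mathbf{x}_0 + B\mathbf{n}_P, \tag{70} $$

and so, the MSE given that $\mathbf{x}_0$ was transmitted is

$$ \varepsilon_{wn}^2 = \frac{1}{M} E\{ \mathbf{n}_P^T B^T B \mathbf{n}_P \} = \frac{1}{M} \sum_{i=1}^{M} \sum_{j=1}^{M} E\{n_i n_j\} \mathbf{b}_i^T \mathbf{b}_j, \tag{71} $$

where $\mathbf{b}_i$ is column vector no. $i$ in $B$. Since the noise is
considered i.i.d., each component of $\mathbf{n}_P$ is independent, so
$E\{n_i n_j\} = \sigma_n^2 \delta_{ij}$ and (71) is reduced to

$$ \frac{1}{M} E\{ \mathbf{n}_P^T B^T B \mathbf{n}_P \} = \frac{\sigma_n^2}{M} \sum_{i=1}^{M} \mathbf{b}_i^T \mathbf{b}_i = \frac{\sigma_n^2}{M} \sum_{i=1}^{M} \|\mathbf{b}_i\|^2. \tag{72} $$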



Since $B = J^{-1}$, and it is well known that a matrix with
orthogonal columns has an inverse, the above result tells us
that there is nothing to gain by choosing a nonorthogonal basis
in the tangent space of $\mathcal{S}$, i.e. the basis can always be chosen
orthogonal. This simplifies both the analysis and the system
itself. Making the Jacobian orthogonal will make the metric
tensor diagonal. Therefore $G^{-1}$ is also diagonal with diagonal
elements $1/g_{ii}$. Now consider (69) again. Using the fact that
$G^{-2}$ is diagonal, $E\{n_i n_j\} = \sigma_n^2 \delta_{ij}$ and with (72) in mind one
can easily derive the following

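$$ \varepsilon_{wn}^2 = \frac{1}{M} E\{ (G^{-1} J^T \mathbf{n})^T (G^{-1} J^T \mathbf{n}) \} = \frac{1}{M} E\{ (J^T \mathbf{n})^T G^{-2} (J^T \mathbf{n}) \} = \frac{\sigma_n^2}{M} \sum_{i=1}^{M} \frac{\|\mathbf{J}_i\|^2}{g_{ii}^2} = \frac{\sigma_n^2}{M} \sum_{i=1}^{M} \frac{1}{\|\mathbf{J}_i\|^2}, \tag{73} $$

where $\mathbf{J}_i$ is column vector no. $i$ of $J$ and $\|\mathbf{J}_i\|^2 = g_{ii}$.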
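The closed form $(\sigma_n^2/M)\sum_i 1/g_{ii}$ can be compared with a direct simulation of the error $G^{-1}J^T\mathbf{n}$. The Python sketch below is an illustration only and assumes a Jacobian with orthogonal columns, so that $G$ is diagonal; the example matrix, function names, sample count and seed are arbitrary choices made here.

```python
import random


def diagonal_metric(J):
    """Diagonal entries g_ii = ||J_i||^2 of G = J^T J (columns of J assumed orthogonal)."""
    N, M = len(J), len(J[0])
    return [sum(J[k][i] ** 2 for k in range(N)) for i in range(M)]


def weak_noise_closed_form(J, sigma_n):
    """(sigma_n^2 / M) * sum_i 1 / g_ii."""
    g = diagonal_metric(J)
    return sigma_n ** 2 / len(g) * sum(1.0 / g_ii for g_ii in g)


def weak_noise_monte_carlo(J, sigma_n, samples=50000, seed=2):
    """Average of ||G^-1 J^T n||^2 / M over i.i.d. Gaussian noise vectors n."""
    rng = random.Random(seed)
    N, M = len(J), len(J[0])
    g = diagonal_metric(J)
    total = 0.0
    for _ in range(samples):
        n = [rng.gauss(0.0, sigma_n) for _ in range(N)]
        # With diagonal G, component i of G^-1 J^T n is (J_i^T n) / g_ii.
        error = [sum(J[k][i] * n[k] for k in range(N)) / g[i] for i in range(M)]
        total += sum(e * e for e in error) / M
    return total / samples


if __name__ == "__main__":
    J = [[2.0, 0.0], [0.0, 1.0], [0.0, 1.0]]  # columns (2,0,0) and (0,1,1) are orthogonal
    print(weak_noise_closed_form(J, 1.0), weak_noise_monte_carlo(J, 1.0))
```

For the example matrix $g_{11} = 4$ and $g_{22} = 2$, so the closed form is $0.375\,\sigma_n^2$; stretching a column of $J$ lowers the corresponding term.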



Appendix C 
Hölder's inequality

Lemma 1: [41] Assume that $f \in L^p(I)$ ($|f|^p$ is Lebesgue
integrable on the interval $I \subseteq \mathbb{R}$) and $h \in L^q(I)$, where
$\frac{1}{p} + \frac{1}{q} = 1$, then

$$ \int_I |f(t) h(t)| \, dt \leq \|f\|_p \|h\|_q. \tag{74} $$

Proof: See [41, pp. 135-136].

Appendix D 
Quantization distortion lower bound 

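A uniform spherical distribution can be found by integrating
a constant $\kappa$ over a spherical region of $\mathbb{R}^m$ and equating the
result to one. The integration is done in generalized spherical
coordinates

$$ \kappa \int_0^{\pi} \cdots \int_0^{2\pi} \int_0^{\Delta/2} \rho^{m-1} \prod_{k=1}^{m-1} \sin(\theta_k)^{m-1-k} \, d\rho \, d\theta_1 \cdots d\theta_{m-1} = 1. \tag{75} $$

This integral equals the volume of an $m$-sphere with radius
$\Delta/2$ scaled by the constant $\kappa$

$$ V = \begin{cases} \kappa \dfrac{\pi^{m/2}}{(m/2)!} \left( \dfrac{\Delta}{2} \right)^m, & m \text{ even} \\[2ex] \kappa \dfrac{2^m \pi^{(m-1)/2} \left( \frac{m-1}{2} \right)!}{m!} \left( \dfrac{\Delta}{2} \right)^m, & m \text{ odd.} \end{cases} $$

Using the relation in (32), then for even $m$

$$ \frac{1}{\kappa} = \frac{\pi^{m/2} \Delta^m}{2^m \frac{m}{2} \left( \frac{m}{2} - 1 \right)!} = \frac{\pi^{m/2} \Delta^m}{2^{m-1} m \Gamma\left( \frac{m}{2} \right)}. \tag{76} $$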

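For odd $m$, things get more involved. Using Legendre's
duplication formula [33, p. 5] one can easily derive that we
get (76) also for this case. And so the pdf of the quantization
distortion is given by

$$ f_{\rho,\Theta}(\rho,\theta) = \begin{cases} \dfrac{m 2^{m-1} \Gamma\left( \frac{m}{2} \right)}{\pi^{m/2} \Delta^m}, & \rho \in [0, \Delta/2], \ \forall \theta_i \\[2ex] 0, & \text{elsewhere.} \end{cases} \tag{77} $$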

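Assuming a uniform source and equal distance between each
neighboring cell and one centroid at the origin, the quantization
distortion can be found by

$$ \bar{\varepsilon}_q^2 = \int_0^{\pi} \cdots \int_0^{2\pi} \int_0^{\Delta/2} \rho^2 f_{\rho,\Theta}(\rho,\theta) \rho^{m-1} \prod_{k=1}^{m-1} \sin(\theta_k)^{m-1-k} \, d\rho \, d\theta_1 \cdots d\theta_{m-1}. \tag{78} $$

The innermost integral in (75) is

$$ \int_0^{\Delta/2} \rho^{m-1} \, d\rho = \frac{1}{m} \left( \frac{\Delta}{2} \right)^m, \tag{79} $$

and since the contributions from the other integrals will cancel
out with $f_{\rho,\Theta}$

$$ \bar{\varepsilon}_q^2 = \frac{1}{\frac{1}{m} \left( \frac{\Delta}{2} \right)^m} \int_0^{\Delta/2} \rho^2 \rho^{m-1} \, d\rho = \frac{1}{\frac{1}{m} \left( \frac{\Delta}{2} \right)^m} \cdot \frac{1}{m+2} \left( \frac{\Delta}{2} \right)^{m+2} = \frac{m}{4(m+2)} \Delta^2. \tag{80} $$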